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We explore in detail the structural, mechanical and thermodynamic properties of a coarse-grained 
model of DNA similar to that introduced in Ref. (T] Effective interactions are used to represent chain 
connectivity, excluded volume, base stacking and hydrogen bonding, naturally reproducing a range 
of DNA behaviour. We quantify the relation to experiment of the thermodynamics of single-stranded 
stacking, duplex hybridization and hairpin formation, as well as structural properties such as the 
persistence length of single strands and duplexes, and the torsional and stretching stiffness of double 
helices. We also explore the model's representation of more complex motifs involving dangling ends, 
bulged bases and internal loops, and the effect of stacking and fraying on the thermodynamics of 
the duplex formation transition. 
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I. INTRODUCTION 

A single-stranded molecule of DNA (ssDNA) con- 
sists of a chain of alternating sugar and phosphate 
groups.121 Attached to each sugar is a base, alanine (A), 
thymine (T), cytosine (C) or guanine (G). Bases are in- 
herently planar, and their tendency to from coplanar 
stacks and undergo hydrogen-bonding leads to the forma- 
tion of double-stranded helices (dsDNA). The canonical 
Watson-Crick base pairs (bp), C-G and A-T, are called 
complementary base pairs because they form the most 
stable hydrogen bonds. 

The different base identities, along with the rules of 
complementarity, allow information to be encoded into 
the single strands.'^ In nature, this allows both strands of 
a double helix to carry the genetic information required 
for life. Recently, this information-carrying property has 
been harnessed in nanotechnology. A set of single strands 
can be designed with a pattern of complementarity that 
specifies a certain 2- or 3-dimensional structure (usually 
formed from branched double-helices) as the global free- 
energy minimum of the system. Strands can then be 
mixed and self-assemble, provided the sequences are well 
designed. When combined with its structural proper- 
ties (dsDNA is stiff on the nanoscale, with a persistence 
length of around 50 nm or 150 bp,Sl and ssDNA has the 
flexibility to act as hinges between duplex sections), such 
selective interactions make DNA an ideal material for 
nanoscale self-assembly. 

The self-assembly of short strands (oligonucleotides) 
was first demonstrated by the Seeman lab, who cre- 
ated a four-armed junctionP Junctions of this type, 
and more complex motifs,'^ have been used to create 
lattices® ^ and ribbons.!^ 3-dimensional structures have 
also been realized: initially, the Seeman group con- 
structed a cub^^and a truncated octahedrorpilin several 
discrete stages. Polyhedral cages that rapidly form as 
solutions of olig onucleotides are cooled have since been 
developed.'iSHi^ These examples illustrate the potential 



of DNA as a material for controllable nanoscale self- 
assembly. 

An alternative approach to self-assembly, DNA 
origami, was recently developed by Rothemund.'i^ In 
this CclSC, Oil long single strand is folded into a desired 
structure by short "staple" strands, allowing the as- 
sembly of an enormous range of 2-dimensional struc- 
tures. This approach has been extended to three 
dimensions, either by linking together 2-dimensional 
sheets,'i3 or by using the twist of DNA to form inher- 
ently 3-dimensional folded strands.fi^ Additional meth- 
ods of 3-dimensional self-assembly are possible: self- 
interactions within a single strand have been used to 
create a tetrahedronj^ and other structures have been 
created from pre-assembled comp onents involving DNA 
and other organic molecules .^lESl 

DNA nanotechnology is not limited to the self- 
assembly of static structures, as hybridization can also 
be used to drive nanodevices.-"^ Such devices typically 
undergo structural changes due to duplex formation or 
toehold-mediated strand displacement (wherein a strand 
in a partially formed duplex is replaced by a strand which 
can form a more complete duplex). The earliest de- 
signs, such as the "tweezers" of Yurke et a/. ,1231 required 
sequential addition of single strands to force a sy stem 
through a conformational cycle, or along a track.'^SMl 
The use of enzyme-facilitated hydrolysis,'^ or fuel in 
metastable states such as single-stranded hairpins,^® has 
allowed the design of autonomous devices. The selectiv- 
ity of DNA binding has also been used to perform simple 
logic operations, offering the potential for "intelligent" 
nanostructures or devices, which respond to certain fea- 
tures of their environment. 

As discussed above, much of DNA nanotechnology re- 
lies either largely or entirely upon B-DNA duplex hy- 
bridization from single strands (although other transi- 
tions can be exploited, such as the formation of single- 
stranded "i motif structureJ^. Furthermore, some bio- 
logically relevant behaviour (such as the opening of tran- 
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sient "bubbles" (stretches of broken bps) within helices 
and the extrusion of cruciform structures in negatively 
supercoiled DNAI^ relies primarily on the properties of 
single and double strands, and the competition between 
the two. 

Information about the intermediate states in assembly 
processes, which are often difficult to resolve in experi- 
ment yet crucial to the processes as a whole, would aid 
the design of nanostructures and nanotechnology. Com- 
puter modelling, provided it can capture the transition 
between single- and double-stranded DNA, has the po- 
tential to offer significant insight into these systems. 

At the most detailed level, atomistic simulations using 
force fields such as AMBER or CHARM offer an intimate 
representation of DNA.'^^ A large-scale systematic study 
of the structural properties of short sequences as repre- 
sented by AMBER has been carried out by the Ascona 
B-DNA Consortium.^ Unfortunately, the number of de- 
grees of freedom (including those of the solvating H2O 
molecules) prohibits the simulation of large molecules for 
long periods of time. For example, simulations of dou- 
ble helices (on the scale of 10-20 base pairs) have only 
recently been extended to time scales of ~ 1 /j.s.'^^^ The 
use of enhanced sampling techniques has given atomistic 
simulations some access to hybridization transitions in 
the smallest duplexes'^^ and hairpins, ■^^ although larger 
systems remain prohibitively expensive to model. 

At the other end of the spectrum, continuum models 
of DNA'221 treat the double helix as a uniform medium. 
Whilst these approaches can provide important insight 
into DNA behaviour on long length-scales, they are not 
constructed to deal with the details of processes involving 
duplex hybridization or melting. 

To gain further insight into hybridization, coarse- 
grained models, which represent DNA through a reduced 
set of degrees of freedom with effective interactions, are 
required. In particular, models whose coarse-grained 
scale is approximately that of the nucleotide may pro- 
vide the necessary compromise between resolution and 
computational speed. 

The simplest available coarse-grained models are sta- 
tistical, neglecting structural and dynamical detail. 
These models use sequence-dependent parameters that 
describe the free-energy gain per base pair relative to 
the denatured state, with extra parameters used for ini- 
tialization of duplex regions and to describe unpaired 
sections within the a structure. Among the most pop- 
ular ar e the Poland-ScheragcP^l and nearest-neighbour 
mo dels ,1^1^ generally used in the context of polynu- 
cleotide and oligonucletide melting, respectively. A 
particularly important variant of the nearest-neighbour 
model, which has been shown to reproduce experimental 
melting temperatures of duplexes ranging from 4-16 bp 
in length with a standard devia tion o f 2.3 K, was intro- 
duced by SantaLucia and Hicks.'SESl in this model, the 
concentrations of oligonucletides A and B, and their du- 



plex AB, are given by: 

= exp ( - (3{AHab - TASab)), (1) 

where the constants AHab and ASab are computed by 
summing contributions from each nearest-neighbour set 
of two base pairs, together with terms for helix initiation 
and various structural features, all of which are assumed 
to be temperature independent. Such a description, in 
which AHab and ASab are temperature independent, 
constitutes a "two-state" model. 

Alternatives to these purely statistical models have 
also been proposed. Everaers et al^ have suggested a 
lattice model of DNA explicitly designed to unify nearest- 
neighbour and Poland-Scheraga models, with the added 
advantage that some structural information is also pre- 
served. Peyrard-Bishop-Dauxois models*^ represent base 
pairs through a continuous 1-dimensional coordinate, al- 
lowing dynamical simulations of denaturation bubbles in 
polynucleotide DNA. None of the models discussed, how- 
ever, provide a sufficiently sophisticated representation 
of the three-dimensional structure and dynamics of DNA 
to allow the detailed study of the transitions involved in 
nanotechnology. 

To study the processes involved in nucleic acid struc- 
ture formation, a fully 3-dimensional, dynamical, coarse- 
grained model is required. "Rigid base-pair" models, in 
which undeformable base pairs are the fundamental unit, 
have been used to study perturbations to DNA such as 
those induced by enzymes."^ By definition, such mod- 
els cannot represent the transition from single strands to 
duplexes, and hence are inappropriate for the study of as- 
sembly processes. Lankas et aZ.!^ directly compared rigid 
base-pair and rigid base models that were parameterized 
to reproduce positional time-series that were generated 
from atomistic simulations of B-DNA. Interestingly, they 
found that the rigid base models, in which the base pairs 
are deformable and nucleotides are the essential unit of 
simulation, generated a more local representation of the 
interactions than rigid base-pair models did, suggesting 
that the base-pairs are a more appropriate level of de- 
scription for structural and mechanical properties of B- 
DNA. 

Several rigid base models, and others in which each nu- 
cleotide has stiff internal degrees of freedom, have been 
proposed in the last decade. These models represent 
nucleotides by several interaction sites, and can be di- 
vided into two kinds. Firstly, some modellers param- 
eterize their effective force fields by direct comparison 
with either atomistic simulations or data from crystal 
structures. An alternative is to take a more heuristic 
approach, designing force fields to provide a reasonable 
description of a range of large-scale properties (such as 
melting temperatures of helices) when compared to ex- 
periment: these two approaches could be described as 
"bottom-up" and "top-down", respectively. 

Bottom-up approaches have been used to study RNA 
nanostructures the response of DNA minicircles to 
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supercoilingpSEH the behaviour of B-DNA over a range 
of conditionsj^ binding of DNA to the nucleosom^D 
and the properties of the resultant model as a function 
of parameterization.E^ Although systematically coarse- 
graining removes some of the arbitrary choices in design- 
ing a minimal model, there are drawbacks. Firstly, the 
resultant force-field will be biased towards the structures 
with which it was parameterized: in particular, equi- 
librium duplex structures are often the primary source 
of information, and hence single-stranded behaviour is 
not necessarily well reproduced. Perhaps more signifi- 
cantly, the transition between ssDNA and dsDNA may 
be poorly represented: indeed, none of the bottom-up 
approaches described above have been used to investi- 
gate melting transitions in a rigorous way, with the focus 
being largely on structural properties. Secondly, "repre- 
sentability problems"!^ mean that careful fitting to dis- 
tribution functions will not necessarily reproduce ther- 
modynamic properties in a reliable fashion;-^ Finally, it 
is not yet known how accurate atomistic simulations are 
in reproducing the duplex hybridization transition. 

All coarse-grained models represent a compromise, and 
an appropriate model must be chosen for the investi- 
gation at hand. Current examples of bottom-up ap- 
proaches are well-suited to studying fluctuations in the 
vicinity of the equilibrium structure in question. By 
contrast, top-down approaches appear to lend them- 
selves to the study of larger changes, particularly assem- 
bly transitions. Top-down approaches have been used 
to study duplex denaturation,!^ hairpin formation ,'^^1^ 
RNA foldin^SM! and mechanical unfolding, Isnill Holli- 
day junction formation,'^ duplex thermodynamicd^^'^ 
and overstretching.'SSl 

For this paper we are mainly concerned with develop- 
ing a model that can treat the formation of complexes 
involving single strands and B-DNA, with the particu- 
lar goal of describing processes that are relevant to the 
self-assembly of DNA nanostructures and the dynamics 
of nanodevices,!^ but also with a view towards biological 
applications. We thus require a good representation of 
the structural, mechanical and thermodynamic proper- 
ties of both single and double stranded DNA. 

An important property to reproduce is the tendency 
of consecutive bases tend to form coplanar stacks, with 
an average separation of about 3.4 A,!^, which is shorter 
than the equilibrium separation of phosphates (along the 
backbone) of approximately 6.5 A. The difference be- 
tween the two length-scales helps determine the shape 
of B-DNA, which forms a double helix to exploit the 
stacking interactions. Helicity can also play an impor- 
tant role in the kinetics of assembly, in particular leading 
to frustration of bonding when strands are topologically 
constrained.'^ 

The two length-scales also mean that single strands are 
ordered in a helical structure at low temperatures. At 
higher temperatures, where entropy dom inates, they are 
disordered and significantly less stiff!^"^^' Such unstacked 
strands are extremely flexible relative to duplexes, per- 



mitting the formation of DNA structures which involve 
sharply bent single-stranded regions, such as hairpins. 
Furthermore, stacking has significant consequences for 
the thermodynamics and kinetics of assembly (the role 
of stacking in the thermodynamics of duplex formation 
is discussed in Section III B 3 1. 



For complex assembly processes involving several inter- 
actions, it is important not only to correctly reproduce 
properties like melting temperatures, but also the exper- 
imentally measured transition widths so that certain fea- 
tures such as hierarchical assembly are preserved. More 
generally, the widths of the transitions determine the re- 
sponse of melting temperatures to concentration changes 
(Section III B 2 ). Finally, a reasonable representation of 
the elastic properties of DNA is important if the model is 
to be used to study systems involving DNA under stress, 
such as minicircles.'^Sl 



Whereas the many other top-down models in the litera- 
ture each have their strengths and weaknesses, we believe 
that none are currently optimized for the particular suite 
of properties that we desire to accurately reproduce. For 
example, most have either ignored the stacking transi- 
tion of single strandj^ ^ * ^'' * ^^ ! or enforced helicity largely 
through dihedral and angular potentials imposed on the 
backbone of a single strand. ^'"^ 63-65 addition, where it 
was considered, the melting transition in previous mod- 
els was g enerally significantly wider than experimentally 
reported.'^^'^^Hm In Ref. [T]we briefly introduced a model 
designed to represent ssDNA, B-DNA and the transition 
between them, and demonstrated its utility for nanode- 
vices by simulating a full cycle of DNA tweezers .^^^ We 
should note that the model is fitted at a fixed salt concen- 
tration, and does not distinguish between the strength of 
A-T and C-G base pairs. 



The aim of the current paper is to give a detailed de- 
scription of our modeling approach. In Sectionjllj we 
present a slightly modified version of the model that 
appeared in Ref. [TJ and discuss its philosophy, param- 
eterization and simulation. The model's representa- 

We 



tion of DNA behaviour is presented in Section III 



first discuss model DNA structure and thermodynamics 



(Sections III A & III B), before considering it's mechani- 



cal properties (Section III C I and the representation of 
certain motifs such as hairpins (Section HID). Finally, 



we include an extensive discussion of the strengths and 
weaknesses of our approach in Section |IV| The support- 
ing appendices include a detailed representation of our 
model potential (Appendix |A]), a statistical model for 
stacking (Appendix [b]) and a statistical model for du- 
plex formation that explicitly accounts for the effects of 
stacking and fraying (Appendix [C| . 
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FIG. 1; (a) Model interaction sites. For clarity, tlie 
stacking/hydrogen-bonding sites are shown on one nucleotide and 
the base excluded volume on the other. The sizes of the spheres 
correspond to interaction ranges: two repulsive sites interact with a 
Lennard-Jones cr (Appendix|Aj equal to the sum of the radii shown 
(note that the truncation and smoothing procedure extends the re- 
pulsion slightly beyond this distance ( Appendix[A]l ) . The distance 
at which hydrogen-bonding and stacking interactions are at their 
most negative is given by the diameter of the spheres. Visualiza- 
tion was found to be clearer with nucleotides depicted as in (b), 
with the subfigures (a) and (b) representing identical nucleotides 
on the same scale. The ellipsoidal bases allow a representation of 
the planarity inherent in the model, with the shortest axis corre- 
sponding to the base normal, (c) A 12 bp duplex as represented by 
the model. 



II. METHODS 
A. The model 

1. Philosophy of the model 

In designing a model, we have aimed to embed the 
thermodynamics of transitions involving ssDNA and ds- 
DNA (in the most common B-form) into a 3-dimensional, 
dynamical, coarse-grained representation that provides 
a reasonable representation of structural and thermo- 
dynamic properties. This ambition naturally coincides 
with a top-down approach. We are not primarily con- 
cerned with the chemical details of interactions, but 
rather their net effect with regard to the properties of 
DNA. In addition, we have attempted to capture these 
properties by using only pairwise excluded volume, back- 
bone connectivity, hydrogen-bonding, stacking and cross- 
stacking interactions (wi th no e xplicitly length- or loop 
size- dependent potential^SMMI). The model we present 
here is a slightly modified version of that which appeared 
in Ref. 1, with the changes improving the representation 
of dsDNA flexibility and making the potential continuous 
and differentiable, allowing simulation methods which re- 
quire forces, such as Langevin dynamics 

An additional consideration in model design is the need 
for computational efficiency (if assembly transitions of 



FIG. 2: Two possible configurations of a 9-base strand at 333 K. 
(a) All neighbours stacked to form a right-handed helix, (b) Most 
neighbours unstacked, giving a flexible, disordered strand. 



complex structures are to be simulated). In our model, 
all interactions are pairwise (i.e., only involve two nu- 
cleotides, which are taken as rigid bodies). This pairwise 
character allows us to make efficient use of cluster-move 
Monte Carlo (MC) algorithms, which facilitate relax- 
ation on all length-scales in a bound structure, and allow 
a much larger typical step size than possible in Langevin 
dynamics. 

Our model consists of rigid nucleotides, illustrated in 
Fig. [T] The three interaction sites lie in a line, with the 
base stacking and hydrogen-bonding/base excluded vol- 
ume sites separated from the backbone excluded volume 
site by 6.3 A and 6.8 A, respectively. The orientation of 
bases is specified by a normal vector, which gives the no- 
tional plane of the base: the relative angle of base planes 
is used to modulate interactions (rather than through the 
use of off-axis sites) . 



2. The potential 

In this section we present an overview of the poten- 
tial. Further details are given in Appendix|Aj Model nu- 
cleotides interact in a pairwise fashion with other nu- 
cleotides in the system. Interactions between nearest- 
neighbours (nn) on a strand are distinct from all others, 
allowing for strand connectivity and stacking. The po- 
tential can therefore be written as a sum over nn pairs, 
and a sum over all others: 



ackbone + Vstack + Vexc) 

nn 

+ ^ {VhB + Vc.stack + Vexc)- 
other pairs 



(2) 



Vbac 



kbo 



is a finitely extensible non-linear elastic 



(FENE) spring (see Appendix A]), with an equilibrium 
length of 6.4 A, representing the covalent bonds which 
hold nucleotides in a strand together. 

Vstack represents the tendency of bases to form copla- 
nar stacks: it is a smoothly cut-off Morse potential be- 
tween base-stacking sites, with a minimum at 3.4 A. It 



5 



is modulated by angular terms which favour the align- 
ment of normal vectors, and the alignment of the normal 
vectors with the vector between stacking sites. As such, 
the interaction encourages coplanar stacks, separated by 
a shorter distance than the equilibrium backbone length, 
leading to helical structures. Right-handed helices are 
imposed through an additional modulating factor which 
reduces the interaction to zero for increasing amounts of 
left-handed twist. 

Vexc and Vg^^, representing the excluded volume of 
nucleotides, prevent the crossing of chains and provide 
stiffness to unstacked single strands. The lack of ex- 
plicit angular or dihedral potentials along the backbone 
allows single strands to be extremely flexible. For non- 
nearest neighbours, smoothly cut-off (and purely repul- 
sive) Lennard- Jones interactions are included between all 
repulsion sites on the two nucleotides. For nearest neigh- 
bours, the backbone/backbone site interaction is not in- 
cluded because the distance between sites is regulated by 
the FENE spring. 

Vhb, representing the hydrogen bonds which lead to 
base pairing, is a smoothly cut-off Morse potential be- 
tween hydrogen-bonding sites, modulated by angular 
terms which favour the anti-alignment of normal vec- 
tors and a co-linear alignment of all four backbone and 
hydrogen-bonding sites. Vhb is set to zero unless the 
two bases are complementary (A-T or G-C). Together 
with Vstackj Vhb causes the formation of anti-parallel, 
right-handed double helices for complementary strands. 

Vc_stack represents cross-stacking interactions between 
a base in a base pair and nearest-neighbour bases on the 
opposi te str and, providing additional stabilization of the 
duplex .I^SlIil We incorporate it through smoothed, cut- 
off quadratic wells, modulated by the alignment of base 
normals and backbone-base vectors with the separation 
vector in such a way that its minimum is approximately 
consistent with the structure of model duplexes. 

Our model currently neglects some features of DNA. 
Although it incorporates sequence specificity (in that 
only A-T and C-G hydrogen bonds are possible), there 
is no sequence dependence in the potentials for either 
stacking, cross-stacking, hydrogen-bonding or excluded 
volume. We have made the simplifying assumption that 
non-complementary base pairs have zero attraction, and 
also neglected the possibility of alternative base-pair ge- 
ometries (such as HoogsteerP|).We also have no explicit 
electrostatic interaction in the model, which may be ex- 
pected to be important as bare ssDNA has a charge of 
— e per base. For this reason, we fit to experimental data 
(where possible) at [Na+] = 500 mM , where electrostatic 
properties are strongly screened. Indeed, at these ionic 
concentrations, the Debye screening length is approxi- 
mately 4.3 A, smaller than the excluded volume diameter 
for backbone-backbone interactions in our model ~ 6 A. 
At the shortest distances allowed by the steric interac- 
tions, charges would have an energy of ~ 2 kT in a Debye- 
Huckel approximation. Other authors h ave at tempted 
to explicitly include a Debye-Huckel term,'^216ll j-^^^ a,lso 



included a salt-dependent, medium-range attraction be- 
tween strands in monovalent salt to facilitate hybridiza- 
tion, the physical origin of which is unclear. 

Many of the simplifications in our model were made 
to reduce the number of possible parameters. For ex- 
ample, sequence dependence would give 16 combinations 
of stacking pairs, each pair requiring several parameters 
to describe their interaction. We also felt that, as an 
initial step in modelling, it was important to obtain a 
good physical representation of the underlying proper- 
ties of DNA assembly (such as the generic dependence 
of melting temperature on length), before we incorpo- 
rated sequence specific or low salt effects. Furthermore, 
some generic effects may be obscured by sequence-specific 
terms (for instance, free-energy profiles such as Figure 
[5] would have sequence-dependent fluctuations overlying 
the general trend). 



3. Parameterization of interactions 

Parameterizing such a model is a non-trivial process, 
as it involves a compromise between the representation of 
various aspects of DNA. In particular, a given parameter 
may influence a wide range of properties and it is difficult 
to design a simple metric to compare the reproduction of 
thermodynamic and mechanical DNA behavior. In our 
case, lengths were initially chosen to give our approxi- 
mate B-DNA geometry. Interaction strengths and widths 
were then altered to give a description of the thermody- 
namics of stacking and duplex formation close to those in 
Ref. [75] and Ref. |3H respectively (for comparison to Ref. 
HHwe used an 'average base pair' - see Section[ni B 2 1 . 



Finally, structural properties on long length-scales were 
checked, and widths of potentials and modulating factors 
adjusted, as potential width determines structural stiff- 
ness. Several iterations of this cycle were performed until 
the current parameter set was found . 

In general, the interaction energy in a coarse-grained 
model should be interpreted as a free energy, as it incor- 
porates a number of implicit degrees of freedom,!^ and 
thus it is plausible that interaction strengths could be 
temperature dependent. To reduce free parameters, we 
have avoided this temperature dependence except for the 
case of the the stacking strength. We found that it was 
difficult to design a stacking transition with an entropy 
as small as required (see Section III B I ) whilst maintain- 
ing an appropriate stiffness for dsDNA. Our stacking 
strength parameter has therefore been taken to be lin- 
early dependent on temperature (see Appendix[Aj over 
the range 270-370 K, the stacking strength increases by 
^ 6%), in effect reducing the entropy cost of the transi- 
tion. 

There are two main possible reasons why this tempera- 
ture dependence of the interaction parameters is required 
in our model. Firstly, it may be that it is an intrin- 
sic property of the stacking interaction. In particular, 
stacking is thought to be partially a result of hydrophobic 
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effectspEH and hence might be expected to be tempera- 
ture dependent in any model without expUcit water. Sec- 
ondly, it may be that the coarse-graining leads to an over- 
estimation of the entropy of the unstacked state relative 
to the stacked state, which then needs to be compensated 
by a temperature dependence in the interaction parame- 
ters. In particular, in order to replicate the flexibility of 
single strands, we impose no restriction on the conforma- 
tion of the backbone-backbone, backbone-base and base 
normal vectors except for excluded volume. This lack of 
constraints is certainly a significant simplification, and 
will allow some conformations that would likely be ex- 
cluded by specific steric clashes in a finer-grained model 
(such specific geometric effects would be exceedingly dif- 
ficult to reproduce in a bead-spring model such as ours) . 
We deem this likely overestimate of available configura- 
tions to be an acceptable price to pay for the flexibility 
of ssDNA necessary for hairpins and nanostructures. 



B. Simulation technique 

The results reported in this paper were obtained us- 
ing the Virtual-Move Monte-Carlo (VMMC) algorithm 
developed by Whitelam and Geissler^, which allows ef- 
flcient MC simulation of strongly bound systems. The 
algorithm takes a selected single-particle move, as with 
conventional MC algorithms, and then grows a cluster 
from connected particles according to energy changes as- 
sociated with the move. The algorithm combines col- 
lective motion with the large step sizes of MC (allowing 
quicker decorrelation and hence equilibration). 

The combination of coarse-graining and an efficient 
MC algorithm provides access to processes on long time 
scales. To indicate simulation efficiency, we considered 
the formation of a 4 bp duplex at its melting temperature, 
in a periodic box of side length 17 nm (effective concen- 
tration 0.34 mM). A recent study"^^ considered a similar 
system using an atomistic description with continuous 
solvent. In the atomistic case, sophisticated sampling 
techniques (replica exchange molecular dynamic and um- 
brella sampling) were required to provide data for the 
transition, which was the sole focus of the study. For our 
model, ^ 8 complete binding and unbinding cycles per 
hour were observed for an unbiased simulation (i.e., one 
without enhanced sampling) performed on a single CPU 
core. 

In order to obtain good statistics for the melting 
transitions, umbrella sampling^^ simulations were per- 
formed at around the melting temperature and the re- 
sults extrapolated to other temperatures using single- 
histogram re- weighting. The number of bases with 
negative hydrogen-bonding energy was taken as a dis- 
crete order parameter for the reaction, Q{x^) (with 
representing the coordinates of the system), and 
the simulations were performed using the biasing weight 
exp {f3W{Q)), with W{Q) chosen iteratively to make the 



partial partition functions 

ZMased ^ J ^^N (_;3(l/(x^) - W^(Q'(x^))) Sq,q, 

(3) 

approximately constant in Q. W{Q) is chosen to flatten 
free-energy barriers, encouraging the simulation to visit 
rarely sampled states, thereby increasing the frequency 
of barrier crossing and improving statistics. We extract 
the unbiased partition functions using: 

^unMased ^ ^Mased ^ [PW (Q)) . (4) 

Simulation efficiency precluded the need for multiple um- 
brella windows for the study of duplex formation, and the 
accuracy of single-histogram re- weighting was checked for 
15 bp duplexes, for which no systematic error over the 
range of extrapolation was found. Simulations of duplex 
formation were performed using two strands in a peri- 
odic box. Such simulations show strong finite-size effects 
due to the neglect of concentration fluctuations. These 
effects can be corrected for using the formalism of Ref. 
[751 allowing the extraction of bulk bonding probabilities. 



III. RESULTS 
A. Basic structure 

The model is specifically designed to allow an approx- 
imate representation of B-DNA in its double-stranded 
state. The relative sizes of the equilibrium backbone 
separation and ideal stacking distance lead to a pitch 
of 10.34 bp per turn at 296.15 K (23°C, approximately 
room t empe rature) similar to experimental estimates of 
10-10. 5.'2Ell Our model length scale is chosen so that 
the average rise per bp at room temperature is equal 
to 3.4 A, which results in a helix with a radius (taken 
as the furthest extent of the excluded volume) of 1 1.5 A, 
comparable to the experimental value of 11.5-12 

If strands are to form a double helix, it is not possi- 
ble to optimize the stacking interaction, as consecutive 
stacking sites cannot sit directly above one another. Sin- 
gle strands, however, are not constrained in this way and 
hence form tighter helices, with a radius approximately 
80% that of a duplex, similar to the 70-80% observed for 
a number of polynucleotide single helices.'^ A pleasing 
result is that, in order to alleviate the reduction in stack- 
ing, hydrogen-bonded bases undergo "propellor twisting" 
whereby bases in a pair twist in opposite directions in 
order to better align their stacking centres with adja- 
cent bases in the same strand. Experimentally, propellor 
twist is seen to vary from around 5° to 15° in GC rich 
regions and from 15° to 25° in sections with large AT 
content. EH In our case we observe an average propellor 
twist of 21.8° at 296.15 K, which is slightly larger than 
the average found for biological sequences. 
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B. Model thermodynamics 

1. Single- stranded stacking transition 

The attractive stacking interaction between adjacent 
bases causes single strands to form helical stacks at low 
temperature, with this order being disrupted as the tem- 
perature increasesP The literature is divided on both 
the nature of the attraction and the thermodynamics 
of the transition. The relative contributions of van der 
Waals, induction, hydrophobic and permanent multipo- 
lar electrostatic interactions remain unclear!^ There has 
also been much debate on the cooperativity with which 
bases stack. Vesnaver and Bresslauer claim that a 13- 
base strand undergoes a completely cooperative transi- 
tion between helical and random coil,'S2 whereas other 
authors have inferred essentially completely uncoopera- 
tive transitions for the individual stacks in poly(C) and 
poly(A).'S2Hlll Other groups claim weak to moderate co- 
operativity, with stacking probability affected by nearby 
base stacking.'^^Hsl is clear, however, that stacking has 
a large influence on the thermodynamics of double helix 
formation, as the magnitude of the enthalpy and entropy 
changes of hybridization increase as the single-stranded 
state becomes more disordered. 

Given the uncertainty in stacking behaviour it is difh- 
cult to constrain the model in this regard. For simplicity 
we compare the model to reported uncooperative stack- 
ing (We note that introducing a large degree of coopera- 
tivity would require adding internal degrees of freedom to 
the nucleotide or including next-nearest-neighbour inter- 
actions). The study of Holbrook et alibis most appro- 
priate, as it deals with heterogeneous strands rather than 
homopolymers, and hence might be expected to provide 
a reasonable estimate of the average stacking strength. 

To characterize the stacking properties of our model, 
we simulated oligonucleotides consisting of identical nu- 
cleotides (preventing the possibility of hydrogen bond- 
ing), and recorded the distribution of the number of 
neighbours with a s tacking interaction stronger than 
a minimum valu^^^ as a function of temperature and 
oligonucleotide length. For each strand length (5-9 and 
14 bases), two simulations (to check convergence) were 
performed at T = 333 K for 10^° MC simulation steps 
(a minimum of 7 x 10* steps per nucleotide), and we ex- 
trapolated the results to other temperatures using single- 
histogram reweighting. For a 14-base nucleotide, around 
50% of neighbours were found to be stacked at 338 K, 
with the transition being so broad that around 30% of 
neighbours remained stacked at 373 K, and 70% were 
stacked at around 306 K. 

The stacking was fitted to a simple statistical model 
(based on that of Poland and Scheraga for helix forma- 
tion in biopolymer^^ which is discussed in detai l in Ap- 
pendix[B| The model contains stacking enthalpieJ^^ and 
entropies A/i** and As**, such that the statistical weight 
(the contribution to the partition function) of an indi- 
vidual pair of stacked bases is exp(— A/i^Y-R^^ + ^s"*//?) 



relative to the statistical weight of the unstacked state.l^^^ 
In addition, the statistical weight is multiplied by a coop- 
erativity parameter a for each contiguous run of stacked 
bases, and an end-effect term w for each stack which in- 
volves a base at the end of the strand. If a and w are 
unity, each neighbour pair is independent. For < cr < 1, 
stacking is cooperative, and for a > 1 stacking is anti- 
cooperative. For < w < 1, end bases are less likely to 
stack, and for w > 1 the opposite is true. 

The four parameter model was fitted to data from 
strands of length 5 — 9 bases, over a temperature range 
of 320 - 352 K, giving: 

A/i''* = -5.55kcalmor^ 
As** = -16.0calmor^K-\ 
a = 0.766, 
w = 0.783. 

As cr and w are close to unity, our model shows only 
weak cooperative and end effects. The entropy and en- 
thalpyparameters are similar to those found by Holbrook 
et al.^who estimated Aft.** = —5.7 and — 5.3kcalmol~^ 
and As** = -16.0 and -15.0calmol-i K"! for two dif- 
ferent strands at [Na+] = 120 mM. Similar results at 
[Na+] = 50mM suggest weak salt dependence in this 
regime.^ 

Simulations performed in which the repulsive steric in- 
teractions were set to zero gave a slightly higher As** 
and values of a and w consistent with unity. Thus we 
conclude that the small cooperative effects in our model 
result from excluded volume. To understand the cause 
of the cooperativity, consider a chain of bases A, B, and 
C, and without loss of generality, consider B fixed whilst 
A and C move relative to it. Due to the requirement 
that base normals must point in the 3' to 5' direction to 
stack (see Appendix[A| , the regions of space in which A 
and C stack with B do not overlap. Therefore, if A and 
B are stacked, the excluded volume that A represents to 
C only prevents C adopting conformations in which it is 
unstacked. By contrast, if ^ and B are unstacked, the ex- 
cluded volume of A can prevent C adopting both stacked 
and unstacked configurations. As a consequence, C has a 
slightly higher tendency to stack if A and B are stacked, 
and so there is a positive cooperativity. Similarly, end 
bases experience more freedom due to the reduction in 
excluded volume, and are therefore less likely to stack. 

The statistical model is very successful. Fig. [3] com- 
pares its predictions to the results for a strand length 
(14 bases) and temperature (300 K) that are both well 
outside the ranges that were used in the fitting. Excel- 
lent agreement is found. 

2. Duplex formation 

Hydrogen bonding between bases can lead to the for- 
mation of bound pairs of DNA strands, which adopt the 
canonical 'B' double helix structure over a wide range 
of conditions due to stacking interactions. In contrast to 
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FIG. 3: Frequency of the total number of stacked bases in a 14- 
base single strand at 300 K from simulations of our model, and as 
predicted by the simpler statistical model with parameters as in 
Equation [5] 
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FIG. 4: (a) Trn as a function of strand length at an equal strand 
concentration of 3.36 X 10~* M, as given by our model (crosses 
connected by a solid line) and averaged parameters from Ref. 1421 
(squares connected by a dashed line), (b) Fraction of 10-base 
strands bound in duplexes at a concentration of 3.36 X 10~* M 
as a function of temperature, from our model (dashed line) and 
using the parameters of Ref. 1421 (solid line) . 



the stacking transition, there is a reasonable consensus in 
the experimental literature on the melting temperature 
{Tm) of duplexes. 

We fitted our model using the two-state model and 
parameters of Ref. 42, which is known to give a very 
good prediction of experimental Tm- Note that we 
do not reproduce two-state thermodynamics (see Ap- 
pendix[Cl) , but rather treat Ref. HH as a useful param- 
eterization of experimental results for the melting tem- 
peratures of short duplexes. As our model contains no 
differentiation between A-T and G-C base pairs, we com- 
pare our results to strands consisting of 'average bases', 
the parameters for which, A/i^*^^ = — 8.2375kcalmoP^ 
and Asg^^ = — 22.019calmon^ K^^, were obtained 
from averaging over all possible complementary base- 
pair steps in Ref. |42l We also use the average helix 
initiation terms A/i™£* = 1.1 kcalmol""^ and As^' = 
3.45 calmoP^ K^^, and an additional salt correction 
of Asf)5* = -0.12754 calmol-^K-i per phosphate for 
[Na+] = 500 mM, again taken from Ref. HI 

We simulated pairs of complementary oligonucleotides 
in a periodic box for a range of strand lengths between 
5 and 20 bases, and extrapolate d to bulk statistics using 
the method discussed in Ref. I79M^ Umbrella sampling, 
using the number of base pairs with a negative hydrogen- 
bonding energy as an order parameter Q, was used to 
ensure good sampling. 

For the purposes of comparison with Ref. |42l we de- 
fined a state to be bound if any hydrogen-bonding in- 
teraction between strands had an energy below a cutoff 
of — 0.60kcalmol~^, with typical hydrogen-bonding en- 
ergies of a single base pair being larger by a factor of 
approximately 7. Doubling the cutoff had no significant 
effect on our results. was taken as the temperature 
at which half of the strands would be bound in a bulk 
solution. 

The variation in melting temperature with duplex 



length is shown in Fig. |4](a), where it is compared to 
the predictions of the model of Ref. The agree- 

ment in the dependence of Tm on length is extremely 
good: this dependence is essentially a measure of the 
cooperativity of the duplex forming transition, which is 
most strongly influenced by the relative contributions of 
hydrogen-bonding and stacking/cross-stacking to duplex 
stability. 

The polynucleotide melting temperature (the melt- 
ing temperature for infinitely long strands) at 500 mM 
[Na"'"] for a strand of 50% C-G content, is predicted by 
the empirical relations given by Blake and DelcourlF^ 
and Frank-Kamenetskir^ as 365.8 K and 363.2 K, respec- 
tively. An approximate value for our model can be es- 
timated by simulating a pair of long, complementary 
strands in a partially bound state, and finding the tem- 
perature at which the free-energy change of adding an 
additional base pair to a partially formed duplex is zero. 
Simulations of partially formed 100-bp strands (with 
the duplex/single-stranded DNA interface at a variety 
of points) gave values of T in the range 364-366 K, in 
good agreement with the empirical relations. 

Fig. |4](b) compares the 10-bp duplex yield as a func- 
tion of temperature for our model with the predictions 
of Ref. |42l The widths of the transitions are consistent 
to within a few degrees Kelvin, with our model consis- 
tently producing a marginally sharper transition for all 
duplex lengths. The width of the transition determines 
the response of the system to changes in concentration. 
Consider, for example, a simple two-state model of DNA 
hybridization, as used in Ref. and expressed in Eqn. 
(fTl). Assuming equal total concentrations of each strand 
(plo]); the width of the transition scales approximately 
as: 

AT^MI, (6) 
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FIG. 5: Pree-energy profile of bonding of a 15 bp duplex, as a 
function of the number of base pairs, at 343 K. 

and the change in Tm with concentration is given by: 

dTrr, ^ kjjTl AT 

d[Ao] [Ao]AH^[Aoy ^> 

and hence agreement in both Tm and the transition width 
at a given concentration imply agreement in Tm over a 
range of concentrations. 

3. Free energy profile of duplex formation and fraying 

The free energy of duplex formation of a 15-bp duplex 
is plotted in Fig. [s] as a function of the number of base 
pairs (the order parameter for our umbrella sampling). 
To avoid complicating features in the free-energy profile 
due to hairpins and misbonds, which can conceal the un- 
derlying trends at low numbers of bonds, only base pairs 
that are present in the desired duplex had a non-zero 
strength of hydrogen bonding in this simulation. The 
general form of the free-energy profile is qualitatively sim- 
ilar to that found in Ref. |64] for another coarse-grained 
model of DNA, with an initial entropy penalty for the 
formation of the first base pair, followed by a downhill 
slope as the duplex 'zips up' in a cooperative fashion. 
As can be seen, the formation of the final base pair is 
actually free-energetically unfavourable, and the typical 
state consists of a duplex with 'frayed' ends. This fray- 
ing arises because bases at the end of the duplex lack the 
stabilizing influence of neighbouring base pairs on either 
side and entropy favours the open state. 

Although fraying is a widely accepted phenomenon,^"* 
experimental data is rather sparse, though it is estab- 
lished that weaker AT ends fray more easily than CG 
capped helices. Nonin et aZ.I^ inferred fraying probabil- 
ities of terminal AT bps of around 0.375 and 0.7 at 273 K 
and 298 K respectively, and found 0.015 and 0.12 for GC 
pairs at the same temperatures (at moderate salt concen- 
trations) , whereas Patel et alW^ found much higher melt- 
ing temperatures for terminal base AT pairs, conclud- 



ing that they were around 50% frayed at 313 K at high 
salt concentration. Our model shows approximately 10% 
fraying at 273 K, increasing to around 21% at 300 K and 
reaching 50% at approximately 330 K, reasonable values 
for 'average' base pairs. We note that in many cases, par- 
ticularly at low temperature, end bps in our model break 
but remain stacked, adopting conformations to maximize 
stacking at the expense of hydrogen bonding. 

4- Effect of stacking and fraying on thermodynamics of 
duplex formation 

We attempted to fit the duplex yield as a function of 
temperature, for each strand length /, using a two-state 
model of the form in Eqn. [l] 

^..|.e>,p(-/,(Ai,,-rA50). (8) 

where [Ai] is the concentration of strand A of length I and 
[Bi] and [AjS/] are the concentrations of its complemen- 
tary strand and the bound pair, v is the volume simu- 
lated, Zii and Zi are the statistical weights (contributions 
to the partition function) of duplexes and single strands 
of length / in our simulations and AHi and ASi the (as- 
sumed T-independent) enthalpy and entropy of transi- 
tion (we note that for our simulations in the canonical en- 
semble, AH corresponds to the change in internal energy 
of the system). It was found, however, to be an unsatis- 
fying fit to the melting curves, and further attempts to fit 
AHi and AS'; as a linear function in I (by analogy with 
the nearest-neighbour model), were unsuccessful. This 
failure should not come as a surprise, however, as several 
authors have indicated that the entropy and enthalpy of 
duplex formation show temperature d ependence due to 
the single-stranded stacking transition.E^I^^IllIlSI A more 
sophisticated model which explicitly treats the stacking 
and fraying is detailed in Appendix[C| We show that, for 
our model, temperature dependent effects can be incor- 
porated into an extended nearest-neighbour description 
of the transition. 

The actual temperature-dependent transition enthalpy 
can be deduced from: 

AH^-^lnK,„ (9) 

where K^q is the equilibrium constant of the reaction. 
The enthalpy changes at Tm for our model are slightly 
larger than predicted by Ref. |42i which is to be expected 
as the transitions are slightly narrower. The discrepancy 
rises from about 6% for 5-bp duplexes to around 22% for 
20-bp double strands. The behaviour of AS is similar. 

To investigate the details of the temperature depen- 
dence of enthalpy changes in duplex formation, we simu- 
lated 15 bp duplex formation over a wide range of temper- 
atures (for clarity, we again only give "correct" pairs an 
attractive hydrogen-bonding interaction), with the data 
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FIG. 6: Variation with T of enthalpies associated with the for- 
mation of a 15-bp duplex. Solid lines represent simulation re- 
sults, dashed lines the predictions of the statistical model outlined 
in Appendix[C] The lines labeled Ai^transition give the enthalpy 
change upon duplex formation for the simulations and the statis- 
tical model. The lines labeled //js and Hbb are the enthalpies of 
the duplex and single strands respectively, relative to a completely 
unstacked state. The transition enthalpy in the statistical model 
is the difference between the latter two curves. The vertical line 
denotes the melting temperature Tm = 342.6 /ST. 



shown in Fig. [6] We find that at low temperatures (up to 
342 K) AH becomes more negative with increasing tem- 
perature, with a gradient that reaches a maximum size of 
around — 0.055 kcalmol^^ per base pair at approxi- 
mately 328 K. At 342 K, however, AH reaches its most 
negative value, before increasing rapidly towards zero for 
higher temperatures. 

The statistical model of AppendixjC] allows us to an- 
alyze this behaviour in terms of the enthalpy changes 
within the bound and unbound states. As shown in Fig. 
|6j the enthalpy of the bound state is approximately con- 
stant at lower temperatures, whereas the enthalpy of the 
single strands becomes less negative with increased tem- 
perature as they unstack, causing the observed tendency 
for AH of the transition to become more negative with 
increasing temperatures. As temperature continues to in- 
crease, however, the typical bound state changes from be- 
ing a fully-formed duplex at low temperatures to a higher 
enthalpy partially-melted state at higher temperatures. 
Thus the enthalpy of the bound state becomes less nega- 
tive as fraying becomes more significant, resulting in the 
observed increase in AH. 

This change in enthalpy due to the stacking tran- 
sition has been o bserved experimentally by several 
groups, l^^'^^'^HUnEIl -^vJio deduced values for the typical 
enthalpy gradient of -0.050, -0.05 to -0.1, -0.062, 
-0.095 and -0.068 to -0.87kcalmol~i K"! per base 
pair, respectively, in reasonable agreement with our 
model. These investigations were generally performed 
wit h either o ligonucleotides with several CG pairs at the 
g-^ j75|82|88 | 90 | polynucleotides,^^ both of which would 
massively reduce the impact of fraying. If we set the 




FIG. 7; Typical configurations indicating relative flexibility of 
double-stranded, stacked single-stranded and unstacked single- 
stranded DNA. (a) 202 bp double helix at 296.15 K. (b) Stacked 
single strand of 202 bases at 277.15 K. (c) Unstacked single strand 
of 160 bases at 296.15 K. 



fraying contribution to zero, we obtain a typical value of 
—0.06 to — O.OTkcalmol"^ K~^, in even better agreement 
with experiment. 

In addition, Jelesarov et. aL— also considered a duplex 
with AT bps at the end of the helix, for which AH be- 
comes more negative with increasing T at low tempera- 
ture, before flattening-off by around 310 K, in agreement 
with the predictions of our model for the consequences 
of fraying. Measurements were not performed at high 
enough T to check for an eventual reversal of the gradi- 
ent of AH with temperature, but our model predicts the 
effect should be observable. In particular, duplexes with 
large AT end regions and stabilizing GC cores should 
demonstrate such an effect. 



C. Mechanical properties 

1. Single-stranded persistence length 

Single strands, particularly when unstacked, are ex- 
tremely flexible relative to dsDNA. This is crucial for 
nanotechnology, as it allows structures to contain highly 
bent ssDNA regions, such as at the vertices of polyhedra 
or the hinges of nanomachines. 

Poly(dT) (long single stands of DNA in which all the 
bases are thymine) is generally assumed to be entirely 
unstacked at room temperature, and has little tendency 
to form secondary structure.'^EIl As a consequence, it can 
be used to test the inherent flexibility of unstacked sin- 
gle strands. Gapped helices have been used by Mills et 
al.,^^ who inferred a high salt persistence length of 20- 
30 A from rotational decay rates, and Rivetti et al.^ 
who studied length distributions with atomic force mi- 
croscopy, finding ^ 16A for short sections (< 5 bases), 
growing to around 28 A for longer regions. Fluorescence 
resonance energy transfer between donors and acceptors 



11 




20 40 60 80 100 20 40 60 80 100 

Distance (bases) Length of measured section (bases) 



FIG. 8: Decay of the correlation (C) of helix axis plotted against 
base separation for a duplex at 296.15 K (squares) and a stacked 
single strand at 277.15 K (stars). The lines are fits to exponential 
decays. Also shown (solid line, no symbols) is the decay of the 
correlation of backbone vectors for an unstacked single strand at 
296.15 K. 



attached to either end of poly(dT) has also been used 
to fit polymer models to chain end-to-end distributions, 
with Murphy et al. finding a persistence length of around 
19.4 A at 500 mM [Na+].E21 Ah of these results suggest 
persistence lengths on the scale of 2-5 bases. 

To compare our model to experiment, we simulated 
single strands of one base type with stacking interactions 
set to zero, as shown in Fig. [7](c), to mimic poly (dT). 
Poly(dT) is sometimes modeled as a worm-like chainpEH 
in which a local stiffness opposes bending, resulting in an 
exponential decay of the correlations of backbone vectors 
with distance. In our model, however, unstacked ssDNA 
is essentially a freely-jointed chain with excluded volume, 
meaning that the conformation of backbone sites is re- 
stricted by steric clashes rather than local stiffness. As 
a result, the correlation of backbone-backbone vectors 
decays slower than exponentially (Fig. [8|, due to steric 
interactions between non-neighbouring nucleotides. Such 
a decay implies that adjacent bases demonstrate larger 
kinking than would be expected from the picture of a 
worm-like chain with an equivalent overall stiffness of the 
strand. Consecutive backbone orientation is restricted 
only by steric clashes, hence large kinks are possible. 
More distant bases, however, still feel the excluded vol- 
ume, and so the tendency is for directional correlation to 
decay slowly. 

It was therefore difficult to obtain an unambiguous 
value for the persistence length to compare to experi- 
ment. We used the general definition from Ref. IMl 



FIG. 9: Lps plotted against the length of the single-stranded region 
of DNA analyzed at 296.15 K. For the purposes of comparison, the 
separation of successive backbone sites is approximately 6.4 A. 



(L-lo) 



(10) 



with L being the end to end vector of the strand and Iq 
representing the first backbone-backbone vector. As the 
strand approaches infinite contour length, the value of 

We estimated 



by evaluating Eqn. 



10 



Lps should tend towards a constant, 



for single-stranded regions of 
lengths from 10 to 100 "Bases, embedded within strands 
of 70 to 160 bases. Four simulations were performed at 
296.15 K for each length for at least 2.5 x 10^ MC steps 
per particle, with the results plotted in fig.[9j The reason 
for embedding the measured length in a longer strand is 
that bases toward the end of single- or double-stranded 
DNA possess an increased relative flexibility. In order to 
obtain persistence length values that are valid for long 
strands where end effects are negligible bases near the 
end of strands were ignored. 

This increased flexibility at the ends, which results 
from fewer restraining interactions, may be manifested 
in experimental systems. It is possible that interpreta- 
tions that rely on the configuration of bases at the end of 
strands may be biased by such increased flexibility. Our 
results show that for strands of ^ 100 bases, the persis- 
tence length is similar to experimentally inferred values 
(19 — 30 A). Lps continues to grow noticeably for con- 
tour lengths much larger than (~ 20 A, or just over 
three bases), an effect that is consistent with the find- 
ings of Rivetti et al.^, and which indicates that non- 
nearest-neighbour interactions are important in provid- 
ing the effective stiffness. This agreement with exper- 
imental results, together with the fact that our model 
provides a reasonably good representation of the effec- 
tive excluded volume, suggests that our prediction that 
the freely-jointed chain gives a better representation of 
the conformational statistics of unstacked single-stranded 
DNA than the worm-like chain picture, should also hold 
for real DNA. 

Mills et al^ also investigated the fiexibility of gapped 
duplexes connected by poly(dA) at 4°C, when the bases 
are largely stacked into single helices. Although the in- 
terpretation depends on the probability of stacking, the 
intrinsic persistence length of the stacked regions was es- 
timated to be in the region of 100 A, corresponding to 
approximately 30 bases (stacked regions have a shorter 
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length per segment than unstacked sections due to twist- 
ing). This value is noticeably larger than that for un- 
stacked strands, but smaller than for duplexes (approx- 
imately 150 bases at high salt concentration). For com- 
parison we simulated single strands of 202 identical bases 
at 4°C for 8 x 10^ MC steps (ignoring the data from the 
five bases at either end), requiring that all bases main- 
tained a stacking interaction of > — 0.60kcalmol~^ with 
their neighbours (doubling this value had no discernible 
effect). Unlike in the unstacked case, excluded volume 
does not play a large role as the length scale over which 
bending occurs is much larger than the size of one base 
(as can be seen in Fig[7](b)). Hence, the relative align- 
ment of vectors between stacking sites (which now act as 
the basic steps along the strand) was observed to decay 
exponentially, allowing a fit of the form: 

(l„.lo) =exp(-n(/o)/L;*r'), (11) 

from which we concluded that Llf^ / {Iq) = 41.5 bases 
(see Fig. [8| for our model. This value is higher than that 
reported by Mills et a/.^^by approximately 50%, but im- 
portantly it is much greater than the persistence length 
of unstacked ssDNA whilst also being much more flexi- 
ble than dsDNA. Furthermore, the estimates in Ref. |69l 
assume unstacked bases behave as regions of persistence 
length 30 A, which is at the upper end of estimates for 
poly(dT). As already noted, our results suggest that lo- 
cal kinking can be much larger than would be implied by 
the persistence length of unstacked bases. As such, the 
flexibility contribution from a single unstacked base may 
be larger than estimated, and consequently the flexibil- 
ity of the stacked regions may be overestimated, possibly 
bringing our model into better agreement with the data. 



2. Double- stranded persistence length 

The persistence length of dsDNA is generally accepted 
to be approximately 450-500 nm at moderate to high 
[Na"*"], corresponding to around 130-150 base pairs.SEM 
We performed three simulations of a duplex of length 
202 bp at 296.15 K for 1.5 x 10^ MC steps, ignoring the 
data from the ten base pairs at either end. Similar to 
our findings for stacked single helices, the correlation of 
the helix axis (defined as the distance between consecu- 
tive base-pair midpoints) at two points was observed to 
decay exponentially wit h d istance, allowing an estimate 
oiLdupiex through Eqn.jll] Fig. [s] indicates a model per- 
sistence length of arouna 125 base pairs, in reasonable 
agreement with experiment. A typical configuration is 
shown in Fig. [7](a). 

3. Double- stranded torsional and extensional stiffness 



torque G to resultant twist A9 of a duplex of length 
I: C = GI/A9. Estimates for C have been made us- 
ing cycliza tion kine tics and topoisomer distribu tion s for 
minicircles,'2E2iIinil luminescence depolarizatiorP^ and 
from twisting of DNA under tension, giving values 
in the range 170-440 f J fm. The effect of salt concen- 
tration o n C is not entirely clear from the experimental 
literature.liS^! 

Calculating the response to torsion is non-trivial, as 
the curvature of the DNA axis makes the twist between 
two ends hard to define. In our previous work,Il3 we at- 
tempted to infer an elastic modulus from the fluctuations 
in the angle between successive bases when projected 
onto the plane perpendicular to the vector joining their 
midpoints. Unfortunately, this method overestimates the 
torsional flexibility, presumably failing to decouple tor- 
sional variation from other fluctuations in a base-pair 
step. In this work, we instead obtain an approximate es- 
timate of the torsional modulus by considering the twist- 
ing of the central 10 base pairs of a 20 bp duplex, and the 
central 20 base pairs of a 30 bp duplex at 296.15 K. Such 
short sections are extremely stiff, minimizing the natural 
bending fluctuations. To provide an unambiguous deflni- 
tion of torsion and twist, MC moves were chosen so that 
the base pairs at the end of the central section remained 
perpendicular to the vector between their midpoints, al- 
lowing the vector between the midpoints to define an axis 
about which torsion could be applied and twist measured. 

Simulations were performed in which the torque ap- 
plied to the end bases was varied between ±8pNnm, and 
the resultant twist used to infer C. A separate estimate 
was also obtained using the equipartition result for the 
variance in twist at zero torque: {A6^^-^^^) = kTl/C. Fur- 
ther simulations used the equipartition result to estimate 
C under a tension of 9 pN, to ensure that stretching the 
duplexes had no effect. All estimates (for both 10- and 
20-bp regions of interest) gave C ~ 455 — 495 f J fm, sug- 
gesting that this is a reasonably robust estimate of the 
torsional stiffness of DNA duplexes in our model. 

A long molecule of dsDNA under low tension responds 
as an extensible worm-like chain, with the behaviour ini- 
tially dominated by the straightening of the chain, before 
stretching the base-pair rise itself becomes rele vant as 
the chain extension approaches the contour length. l^iSIIinSI 
At higher forces, the duplex undergoes an overstretch- 
ing transition and the B-DNA structure breaks down.^^^ 
Experimental estimates for the extensional modulus K, 
obtained from fitting force-extension curves to extensible 
worm-like chain mo dels, gi ve K in the region of 1050- 
1250 pN at high salt! i°5 | i06 | 

The extensional modulus K was estimated by apply- 
ing tension to a 100-bp region within a 110-bp double 
helix, and fitting the resultant force-extension curve to 
the result of Odij 14^221 for extensible worm-like chains: 



Torsional rigidity (in the linear regime) is quanti- x — L fl + [1 + coth ]\ (12) 

fied by an elastic modulus C, which relates applied ^ "I K 2F ^ / ' 
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where 



y 



LpskT 



1/2 



(13) 



in which x is the extension resulting from a force F 
apphed to a duplex of contour length Lq and persis- 
tence length Lps- Performing an unconstrained three- 
parameter fit with the values of Lq, Lps and K gave 
an excellent agreement with the data, as shown in Fig. 
lO) with K = 2120 pN, Lq = 339.4 A and Lp^ = 438 A 
{129 bp). The value of Lq is similar to that expected 
from the rise of a short duplex (exactly 3.4 A per base 
pair would give Lq — 336.6 A), and Lp^ is only slightly 
larger than the estimate from the decay of the correlation 
of the helix axis (415 A). This agreement suggests that 
the extensible worm-like chain model provides a good 
description of the model's properties in this regime, and 
that the value oi K — 2120 pN is a reasonably robust one 
for our model. 

Our model gives C ~ 475 f J fm (slightly larger than 
the top of the experimental range of 170-440 f J fm) and 
K w 2120 pN, (about twice as large as typical experi- 
mental estimates). We do not believe the differences are 
crucial to the processes we are interested in investigating 
(although certain quantities, such as the critical twist 
density at which plectonomes are extruded, will be af- 
fected). It was found to be difhcult to reparameterize 
the model to reduce these moduli without decreasing the 
persistence length, which is already slightly below exper- 
imental estimates. We feel that the current compromise, 
in which the persistence length is most faithfully repro- 
duced, is a reasonable one as it is easier to imagine that 
nanostructures and nanodevices would be more sensitive 
to bending than torsional or extensional stiffness. 

It is worth noting that recent investigatio ns have 
suggested that DNA overwinds when stretched.^i^si Our 
model does not reproduce this anti-intuitive behaviour, 
instead slightly untwisting as the stacking distance is ex- 
tended. It is possible, therefore, that the model fails to 
capture the softness of a mode of deformation - pe rhaps 
the sloping of base pairs with respect to the axi^^^^ -that 
leads to this behaviour. If this is the case, it is perhaps 
unsurprising that the estimated moduli are larger than 
experimental observations 



D. Structural motifs 

1 . Hairpins 

DNA hairpins, which occur when a self-complementary 
strand binds to itself and forms a duplex stem and 
an unhybridized loop (Fig. 11), are a common struc- 
tural motif. They have biological importance as a 
mechanism for release of superhelicity through cruci- 
form formation. ■^^ Their relevance to nanotechnology in- 
cludes metastable states (either occurring by accideniP or 
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FIG. 10: Tension applied against extension for the central 100 bp 
of a 110-bp duplex at 296.15 K. The squares are simulation results, 
the solid line is a fit using Eqn. |12[ 




FIG. 11: 
343 K. 



A hairpin with a 12 bp stem and an 18-base loop at 



through design) .ISHlll In addition, they are an extremely 
common motif in biological RNA structures.^'^ Aside 
from our earlier work using a previous parameterization 
of the current model,'^' we are unaware of any simulta- 
neous application of a coarse-grained model to the for- 
mation of both hairpins and bimolecular duplexes. Our 
approach, in which the single strands have the potential 
to be extremely flexible, allows for hairpins and duplexes 
to have appropriate relative stabilities. 

To demonstrate the ability of our model to represent 
hairpins, we simulated systems with stem sizes ranging 
from 6-12 bps, and loops of 6-18 bases. Four simula- 
tions for each hairpin were performed in the vicinity of 
Tm for 4 X 10^° MC steps (corresponding to at least 10^ 
steps per nucleotide). Umbrella sampling as a function 
of hydrogen-bonded base pairs was used to ensure good 
statistics. In this case, we considered only states with at 
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and As^^l"" = -7.33calmor^K 



FIG. 12: Variation of hairpin melting temperature with (a) loop 
length and (b) stem length from our model (symbols connected by 
dashed lines) and from Ref. 1421 



least one of the 'native' bps in the stem present as being 
a hairpin, as long loops have the potential to form tran- 
sient base pairs with little relevance to the stability of 
the target structure. SantaLucia has presented parame- 
ters for estimating the melting temperature of hairpins,^^ 
which we again take as a good representation of exper- 
imental results. These parameters include sequence in- 
dependent entropy penalties for loop formation and en- 
thalpy/entropy terms for the stabilizing effect of the first 
mismatched bp in the loop (called a 'terminal mismatch': 

= -2.91kcalmor^ 
^). Our results for 
Tjn are compared to the predictions of Ref. [42] in Figs. 
12 (a) and (b). is defined as the temperature at which 
a strand is in a hairpin state half of the time. 

The results indicate that our model slightly underesti- 
mates T„j for hairpins relative to the predictions of Ref. 
1421 (and by extension, experiment) by approximately 3K, 
which is slightly less than 1% of the absolute melting 
temperature (at the Tm predicted by Ref. |42l our hair- 
pins constitute approximately 25% of the ensemble rather 
than 50%). Encouragingly, the trends with loop length 
and stem size are well reflected by our model (this is 
particularly pleasing, as the dependence on loop length 
was not used in parameterization) , an indication that the 
majority of the physics of hairpin formation is well rep- 
resented by our model. We note that our model is less 
successful for the smallest loops (3-5 bases), possibly be- 
cause it does not incorporate specific interactions within 
a tightly packed loop that may provide extra stability.'^J^ 
As found with duplex formation, transition widths for 
our model are slightly smaller than predicted by Ref. H51 
(the difference is very similar to that observed in Fig. 
ib)). 



2. Mismatches, bulges and internal bubbles 

A variety of other DNA motifs exist, such as duplexes 
involving mismatches between non-complementary base 
pairs or with one strand carrying extra, unpaired bases. 
SantaLucial^ has provided parameters for the influence 



of these motifs on . In many cases, they are highly se- 
quence dependent and it is less clear than in the simple 
double helix case (where the variations in parameters are 
relatively smaller) that averaging over AS" and AH con- 
tributions for all sequences is a reasonable approach to 
find an average effect. It should, however, give a rough 
estimate of the typical change in melting temperature 
due to a motif. 

We compared the effect of several motifs on model du- 
plex T„j to the predictions of Ref. SH again averaged 
over all possible sequences (Table |l|. The simplest pos- 
sible case is that of a single unpaired base at the end of 
a strand, generally referred to as a 'dangling end'. Typi- 
cally, dangling ends are observed to provide a stabilizing 
influence, assumed to result from cross-stacking with the 
final base pair of the duplex, although t he de gree of sta- 
bilization is highly sequence dependent.'^^Ell xhe cross- 
stacking interaction included in our model provides such 
a stabilizing effect, and the degree of stabilization is in 
good agreement with the predictions of Ref. |42l 

In contrast to dangling ends, extra, unpaired bases on 
one strand within the helix are highly destabilizing, as 
they disrupt the helix structure. In the terminology of 
SantaLucia, these are known as bulges. In general, our 
model slightly underestimates the destabilization of he- 
lices due to bulges compared to the predictions of Ref. 
|42| although the observed melting temperatures remain 
within 2% of the predictions. 

If a non-complementary pair of bases is added to an 
otherwise complementary duplex to form a mismatch, 
the effect is generally stabilizing at the end of a duplex 
(this is a "terminal mismatch" ) and destabilizing in the 
interior. Our model reproduces this tendency as shown in 
Table|l| and also captures the increase in destabilization 
if the mismatch region is extended (to form an internal 
"bubble"). Once again, the destabilizing effect of motifs 
internal to the duplex tend to be slightly underestimated 
relative to the predictions of Ref. 42, and the observed 
melting temperatures again remain within around 2% of 
the predictions. 

The motifs provide a good test of the model, as many 
were not considered in parameterization (although the 
dangling ends and terminal mismatches were used to con- 
strain the strength of cross-stacking). In addition, mis- 
bonded structures involving these motifs may have a role 
in the kinetics of nanostructure assembly, and hence it is 
important that the model provides a reasonable represen- 
tation of them. Although in some cases the quantitative 
agreement with Ref. 21] is not perfect, the model rep- 
resents these motifs in a physically sensible way and the 
trends in stability at least qualitatively reflect the average 
properties of DNA. Furthermore, the typical magnitudes 
of ATjn are reasonable, with the remaining within 2% 
of the average predictions of Ref. [H] It is possible that 
an underestimate of the disruptive effect of extra bases 
on the helical structure,'^ perhaps because the excluded 
volume of bases is smaller than in reality, causes the un- 
derestimate of ATm due to internal motifs. This effect, 
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Motif 


Complementary bp 


Motif size 


Ar,„ / K 








Our Model Ref. |42] 


Dangling end 


5 


1 base 


+3.95 +4.24 




8 


1 base 


+ 1.20 +1.44 




15 


1 base 


+0.74 +0.61 



Bulge 8 1 base -18.58 -23.40 

2 bases -24.64 -27.23 

15 1 base -8.86 -12.58 

2 bases -11.51 -11.67 

5 bases -16.91 -13.78 



Terminal mismatch 


5 


1 base / strand 


+6.85 


+6.95 




8 


1 base / strand 


+2.73 


+2.55 




15 


1 base / strand 


+0.74 


+0.63 


Internal mismatch 


8 


1 base / strand 


-8.77 


-14.09 


/ bubble 




2 bases / strand 


-15.77 


-21.86 






5 bases / strand 


-25.83 


-28.81 




15 


1 base / strand 


-5.35 


-4.97 






2 bases / strand 


-9.53 


-11.60 






5 bases / strand 


-15.62 


-15.74 



TABLE I; Effect on the molting temperature of a complementary duplex due to the addition of a motif. In tliis table, ATm is the difference 
between the Tm of a structure with the motif and a fully complementary duplex consisting of the same number of complementary bps as 
the motif structure. For internal mismatches, bulges and bubbles, the motif was placed at the centre of the duplex. 



however, would be expected to be larger for bulges than 
for mismatched pairs or symmetric bubbles. 

Given the good agreement between the model and Ref. 
2^ for a single mismatch added to a 15-bp duplex, we in- 
vestigated how the position of the mismatch affected sta- 
bility. T,„ is plotted against the position of the mismatch 
in Fig. [13] As can be seen, there are two distinct regimes, 
with the melting temperature initially decreasing as the 
mismatch is moved from the end of the strand (where it 
is stabilizing) towards the centre. Eventually, however, 
it reaches a plateau at around five bases from the end of 
the strand. 

The cause of this plateau can be identified from ex- 
amining the free-energy profiles for duplexes with mis- 
matches located two and six bp from the end (Fig. 14 1. 



The first point to note is that the stability of duplexes 
with the maximum number of base pairs (15) is nearly 
identical, despite the difference in mismatch position. 
This suggests that provided a mismatch is surrounded 
by base pairs on either side, changing its location has lit- 
tle effect on the total free energy. The difference in Tm 
arises instead from a difference in the lowest free-energy 
state. 

When the mismatch is near to the strand end (in the 
regime where Tm depends on mismatch position), the 
most stable state consists of the larger section of duplex 
formed with the bases beyond the mismatch unpaired. 
In this regime, the total free-energy gain from pairing 
the bases beyond the mismatch does not compensate for 



the free-energy cost of enclosing a mismatch in a helix. 
As the mismatch is moved towards the centre, the larger 
section loses bases and so becomes less stable, with the 
consequence that Tm drops. At some point, however, 
it becomes favorable for the bases in the shorter region 
to also bond. From this point onwards, the most stable 
state consists of the two duplex regions surrounding the 
mismatch. The net effect of moving the mismatch further 
towards the centre only marginally affects the overall sta- 
bility of the duplex. As a result a plateau in Tm should 
occur. 

As the temperature is lowered, the free-energy gain 
from base pair formation increases. As a consequence, 
the number of bases required before the region beyond 
the mismatch is stable as a duplex decreases. For exam- 
ple, we find that for a mismatch two bases from the end 
of a 15 bp duplex, the enclosed mismatch state becomes 
the most stable just below 320 K. 

It is claimed in Ref. 42 that the stability of a mis- 
match is independent of its position, except for termi- 
nal mismatches and mismatches occurring one base from 
the end, which may cause the final base pair to be un- 
stable. Our simulations suggest, however, that the dis- 
tance of the mismatch from the duplex end at which T„ 
plateaus should increase with strand length (as longer 
strands melt at higher temperature). Furthermore, a 
similar temperature-dependent influence of motif loca- 
tion should hold for all destabilizing internal bubbles and 
bulges, as the beginning of the plateau simply indicates 
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FIG. 13: Melting temperature of 15-bp complementary helix with 
an additional mismatch added against the distance of that mis- 
match from the end of the strand. The melting temperature in the 
absence of a mismatch is indicated via the horizontal line. 
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Number of base pairs 

FIG. 14: Free energy profile at 339 K for a 15 base pair duplex with 
one additional mismatch placed 2 bases from the end (squares) and 
6 bases from the end (crosses). 



the point at which it is free-energeticaUy favourable to en- 
close the disruption. This rcsuh should be qualitatively 
robust to the approximations in the model. In partic- 
ular, sequence dependence will likely cause fluctuations 
but not destroy the general trend. 



IV. DISCUSSION 

We have examined in detail the structural, mechanical 
and thermodynamic properties of a coarse-grained model 
of DNA based on that presented in Ref. !'l (and used there 
to simulate a full cycle of DNA tweezers, an iconic nan- 
odevice). Several small alterations to the model were 
made in order to improve the description of DNA flexi- 
bility and allow for the calculation of forces and torques. 
The aim of the model is to embed the known thermo- 
dynamics of B-DNA into a dynamical, coarse-grained 




FIG. 15: Typical configurations of a duplex with 15 complemen- 
tary bp and 1 internal mismatch at 335 K. a) Mismatch two bp from 
the end of the strands, with unpaired bases after the mismatch, b) 
Mismatch six bp from the end of the strand, enclosed by two intact 
helices. 

representation of DNA while simultaneously providing 
a reasonably accurate description of the structural and 
mechanical properties of B-DNA and ssDNA. 

The model provides a good quantitative representation 
of the three key thermodynamic processes that affect self- 
assembly: single-stranded stacking, duplex hybridization 
and hairpin formation. To our knowledge, this is the first 
coarse-grained model for which all three processes have 
been considered simultaneously. 

The mechanical properties of DNA are also reasonably 
well represented by the model, with the singled-stranded 
persistence length (for stacked and unstacked bases) and 
double-stranded persistence length, stretch modulus and 
torsional modulus all of similar size to typical experimen- 
tal estimates. Importantly, the inclusion of the stacking 
transition allows single strands to be unstacked and flex- 
ible, which facilitates the formation of hairpins as well 
as other DNA nanostructures for which single-stranded 
regions are important. 

The model contains several simplifications, the most 
important of which are the lack of sequence dependence 
beyond the specificity of A-T and G-C bonds and the 
absence of explicit electrostatic interactions. Thus the 
model cannot predict screening effects without a new pa- 
rameterization at each salt concentration. Furthermore, 
in its current state, the model may incorrectly repre- 
sent structures that involve the close proximity of strands 
that are not bound to each other, where it would fail to 
capture the cumulative repulsion resulting from adjacent 
phosphate sites. The model is also only capable of rep- 
resenting structures involving B-DNA and ssDNA, and 
the equal groove size in the model may also mask subtle 
effects related to major and minor grooving. 

Ignoring sequence heterogeneity dramatically lowers 
the number of parameters needed for the coarse-grained 
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model. It also simplifies the analysis of the physical 
processes, naturally generating results for an "average 
strand" . This picture may be particularly advanta- 
geous when sequence effects obscure an important gen- 
eral trend. Of course there are also many processes where 
sequence heterogeneity is critical, for example, preferred 
sites for bubble nucleation. Such effects are not resolved 
by our model. Nevertheless, for many applications in 
DNA nanotechnology, sequence dependent effects beyond 
complementarity are not that critical to design or func- 
tionality. For example, in Ref. fT] we show that the en- 
tropy cost of bringing an anti-fuel strand in close proxim- 
ity to the tweezer complex slows down the displacement- 
mediated detachment of the first arm of the fuel strand. 
Such predictions should be fairly robust and indepen- 
dent of sequence heterogeneity effects. We also show how 
metastable hairpin formation in the anti-fuel strand can 
further affect the free-energy profile and the related ki- 
netics of the displacement process. Again, this general 
prediction should be fairly robust, but how it plays out 
for a particular set of tweezers will depend on how easily 
the anti-fuel strand sequence forms hairpins. For exam- 
ple, if the metastable hairpin formation is undesirable, 
then our predictions could be supplemented by methods 
such as the nearest-neighbour model in order to design 
strands that minimize hairpin formation. 

Similarly, in the current paper we make a series of pre- 
dictions that should be relevant to experiment. For ex- 
ample, we predict that a maximum in the magnitude of 
the enthalpy change of duplex formation, Ai/, should 
occur as the temperature nears the polynucleotide melt- 
ing temperature and fraying begins to reduce the num- 
ber of base pairs in the bound state. The exact location 
and magnitude of the maximum will depend on sequence- 
dependent effects such as the exact melting temperature, 
whether the end bases form weaker AT or stronger CG 
bonds that promote or repress fraying, respectively, as 
well as the thermodynamics of single-stranded stacking. 
But our prediction of a maximum in the absolute value 
of AH should be fairly robust. 

We also predict that the of a duplex containing a 
destabilizing motif should depend on the location of the 
motif in a temperature-dependent fashion. As the desta- 
bilizing motif is moved towards the centre of a duplex, 
the melting temperature should decrease before reach- 
ing a plateau. The distance from the end at which the 
plateau is observed will increase with Tm and the desta- 
bilizing effect of the motif. Both of these effects result 



from sufficiently generic properties that we expect them 
to be resilient to the approximations of the model. 

When compared to the nearest-neighbor model, our 
model tends to slightly underpredict the effect of dan- 
gling ends, bulges, terminal mismatches and internal mis- 
matches on the duplex melting temperature. Again, it 
should be kept in mind that for real DNA the effect of 
each of these motifs will depend very much on the ex- 
act sequence, whereas our predictions are for an average 
over all possible sequence permutations. Nevertheless, in 
a system where multiple kinetic traps are relevant, extra 
care should be taken when interpreting the simulations 
because the relative stabilities of different states could be 
somewhat misrepresented. 

We also make some predictions for the conformational 
statistics of dsDNA, suggesting that single strands be- 
have much more like freely-jointed chains with excluded 
volume than like worm-like chains. A particular conse- 
quence of this difference is that freely-jointed chains typ- 
ically undergo much larger local kinking than worm-like 
chains with an equivalent effective persistence length. 

Finally, we have demonstrated that a nearest- 
neighbour two-state model of duplex formation can be ex- 
tended to incorporate stacking and fraying. This exten- 
sion suggests a way to reconcile the appealing simplicity 
of nearest-neighbour models with temperature variation 
of both single- and double-stranded states. To develop 
such a model, however, would require a much greater con- 
sensus in the properties of single-stranded stacking and 
fraying than currently exists. 

As it stands, we believe the model has the poten- 
tial (both in terms of accuracy and computational effi- 
ciency) to open up a range of previously inaccessible gen- 
eral problems involving the interplay between single- and 
double-stranded DNA, including many aspects of DNA 
nanotechnology. For example, we are investigating the 
operatio n of a DNA walker, the force-induced melting 
of DNA,lii^ the assembly of a DNA tetrahedron^^ and 
binding of hairpins in the presence of a DNA catalyst.!^ 
The model may also be applied to biologically relevant 
processes such as the extrusion of cruciforms in super- 
coiled DNA containing inverted repeats.^^ Future work 
will aim to incorporate sequence-dependent interaction 
strengths (we note that much of the sequence dependence 
should arise from stacking), major and minor grooving 
and an implicit model for electrostatics, as well as com- 
paring to atomistic simulations to improve the descrip- 
tion of fluctuations on the base pair level. 
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small: we therefore use the term "enthalpy" to describe 
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enthalpy and entropy changes per pair of interacting 
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Appendix A: Model details and parameterization 

The current model is based on that introduced in Ref. 
m with some changes introduced to give duplexes more 
flexibility (having performed a wider range of structural 
tests, the stiffness was found to be overestimated in 
the old version). Truncated interactions have also been 
quadratically smoothed (making the potential continuous 
and differentiable, allowing simulation with methods like 
Langevin dynamics). Although this introduces further 
parameters, the thermodynamic and structural proper- 
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ties are largely unaffected by the details of smoothing. 

The functional forms used in the interactions are given 
below: 



FENE spring (used to connect backbones): 



(Al) 



• Morse potential (used for stacking and H-bonding) : 
VMorsc(?', e, ro, a) = e(l - cxp (-(r - ro)a))^. (A2) 

• Harmonic potential (used for cross-stacking): 

^harm(?', £, ro) ^ ^ {r ~ Tq) ■ (A3) 



• Lennard - Jones potential (used for soft repulsion) ; 



• Quadratic smoothing terms: 

"l4mooth(2;, b, Xc) = b{Xc - 



(A4) 



• Quadratic terms (used for modulation) 

v^od{o,a,eo) = i-a{e-eof (as) 



(A6) 



These functional forms are combined to give the fol- 
lowing smooth and diffcrcntiable functions: 



The radial part of the stacking and hydrogen-bonding potentials: 



h{r) 



VMorso(r, 6,7-0,0) - VMorso(rc,e,ro,a)) if r'""" 
low ^low 



< r < r 



high 



e1/smooth(r,5"f\r,^^9'') 




otherwise. 



• The radial part of the cross-stacking potential: 

Vha.rm{r, £, To) - T4arm(rc, £, ^o) if r'"™ 

/2(r) 



eVsmootbir, b'°-,r^'"") 




.low 



it r^"'" < r < r" 

j,high ^ J. ^ j,high 

otherwise. 



• The radial part of the excluded volume potential: 

/3(r) = < 



VLj(r, e, cr) if r < r*, 

eV;inooth(r, b, Tc) if r* < r < rc, 
otherwise. 



• The angular modulation factor used in stacking, hydrogen bonding and cross-stacking: 



h{e) 



v^od{o,a,eo) ifOa- AO* <e <eo + Ae*, 

V;mooth(0, b, Oo - A9c) if 9o-Ae,<9<9o- AO*, 
14mooth(e, b, 9o + A9c) if 9^ + A9* <e<eo + A9„ 







otherwise. 



(A7) 



(A8) 



(A9) 



(AlO) 



Another modulating term which is used to impose right handedness (effectively a one-sided modulation): 



/5(0) 



Vmod{x,a,0) 



if X > 0, 

if X* < X < 0, 



otherwise. 



(All) 



The potentials and parameters used to describe each interaction are hsted in the TablcUB When more than 



21 



Interaction Functional form Parameters 



backbone spring 


Vfcnc ('^backbone ) 


k^2 


A = 0.25 


r-Q = 


0.7525 






^backbone 
















hydrogen bond 


/l(»"bond) 


e = 1.077 


a = 8 


ro 


= 0.4 




= 0.34 


Vhb 








rc = 


= 0.75 


^high 


= 0.70 






a = 1.50 


fo = 


Ae* 


= 0.70 








U{e2) 


a = 1.50 


fo = 


AO* 


= 0.70 










a = 1.50 


00 = 


AO* 


= 0.70 








MO4) 


a = 0.46 


6*0 = TT 


Ae* 


= 0.70 








MO7) 


a = 4.00 


^0 = 7r/2 


AO* 


= 0.45 










a = 4.00 


Oo = 7r/2 


AO* 


= 0.45 






stacking 


fl (^stack) 


e = 1.2145 


a = 6 


ro 


= 0.4 


J.I0W 


= 0.32 


^stack 




+2.6568 kT 




rc 


= 0.9 


^high 


= 0.75 






a = 1.30 


fo = 


AO* 


= 0.8 










a = 0.90 


6*0 = 


Ae* 


= 0.95 










a = 0.90 


00 = 


AO* 


= 0.95 








/5(cOs((^l)) 


a = 2.00 


X* = -0.65 












/5(C0S(<^2)) 


a = 2.00 


X* = -0.65 










cross-stacking 


/2(rcstack) 


e = 47.5 


ro = 0.575 


rc = 


0.675 


j-io™ - 


= 0.495 


^^c_stack 












^high 


= 0.655 




/4(ei) 


a = 2.25 


60 = 2.35 


Ae* 


= 0.58 










a = 1.70 


eo = 1.00 


Ae* 


= 0.68 








MO3) 


a = 1.70 


eo = 1.00 


Ae* 


= 0.68 








fiiOi) + f4{7T - 64) 


a = 1.50 


fo = 


Ae* 


= 0.65 








h{0^) + U{-K - Or) 


a = 1.70 


00 = 0.875 


Ae* 


= 0.68 








f4{es) + f4{n - 6s) 


a = 1.70 


00 = 0.875 


Ae* 


= 0.68 






excluded volume 


/3(rexi) + fair 0^2) 


e = 2.00 


cri = 0.70 


r*i^ 


0.675 








+/3(j'ex3) + /3(rcx4) 




(72 = 0.33 


rl = 


= 0.32 












erg = 0.515 


^■3 = 


= 0.50 












a4 = 0.515 


rl = 


= 0.50 







TABLE II: Parameter values in the modeL All lengths are defined with respect to a reduced lengthscale (1 unit = 8.518A), 
all angles are given in radians and all energies are defined with respect to a reduced temperature (kT = 0.1 corresponding to 



not include an /4(roxi) term. Note the directional de- 
pendence in the stacking interaction: the angles are de- 
fined between normal vectors of bases and a vector join- 
ing bases in the 3' to 5' direction. Only complementary 
base pairs posses non-zero hydrogen-bond energies. 

To ensure right-handed helices, the modulation of 
stacking interactions is somewhat subtle, involving chiral 
terms. Consider two consecutive bases in a strand, i and 
J, with i ^ j corresponding to the 3' — > 5' direction. 
The angles 6*5 and are defined as the angles between 
the normals of i and j and r'-^^g^^j^ (with r'^^^^^. being de- 
fined as the vector from the stacking site of i to that 
of j). Thus, stacked bases have normals pointing in the 
3' — >■ 5' direction, allowing the definition of a local axis. 
The angles and (/)2, defined in terms of this axis, pro- 
vide helicity. For each base we define a normalized vector 
■^hoiicity = f''' ^ f5^j^(,ij_|3jjsg, where a = i,j and fback-baso 



300 K). The variables in the potential are defined in Fig. 16 



one function is listed for an interaction, the total interac- 
tion is a product of all the terms. Given the parameters 
of the main part of the interaction (for example, e, tq, a 
and Tc for the VMorso part of fi{r)), the parameters of the 
smoothed cutoff regions are uniquely determined by en- 
suring continuity and differentiability at the boundaries 
f^^iow g^j^j ^high £qj. /i(r)). The nucleotide geometry and 
definition of the angles and vectors used in the potential 
are shown in Fig. |I6[ 

The potential of the system is given by: 



V — ^ {Vbackbone + Vgtack + Kxc) 
nil 

+ ^ {VhB + Vc_stack + Vexc) , 
other pairs 



(A12) 



where the sum over nn runs over consecutive bases within 
strands, and VJ^.^ is equal to Vexc except that it does 
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FIG. 16: Illustration of variables used in the potential of the DNA model. 



is the normalized backbone site to stack site vector of 
base a. 4>i and (j)2 are the angles between v" and the 
base normals: for a right handed helix, these are < 7r/2 
and the stacking interaction is modulated to disfavour 
greater angles. 



Appendix B: Statistical model of stacking 



It is instructive to characterize the thermodynamics of 
the model using a simpler, statistical model, as it high- 
lights the causes of certain behaviour. We model the 
stacking transition using a statistical description based 
on that of Poland and Scheraga.!^ In this model, a given 
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pair of neighbours can be either stacked or unstacked, 
and the Ust of stacked pairs specifies the system configu- 
ration. 

If each stacking pair were independent, the contribu- 
tion to the partition function from a configuration (its 
relative probabihty of occurring) would be given by: 



(Bl) 



where u and v represent the contributions to the parti- 
tion function ("statistical weight") of a stacked and an 
unstacked pair respectively, Ni and Nj are the number of 
stacked an unstacked pairs and zq denotes the trivial con- 
tribution from translation and orientation of the whole 
strand. As discussed in Sectionllll B 11 the excluded vol- 



ume of nucleotides means that pairs of neighbours are 
not independent. To deal with this, we introduce two 
new parameters. The statistical weight of a continuous 
section of n stacked pairs is now given by: 



u{n) 



au w 



(B2) 



with X being equal to the number of bases in the run 
of stacked pairs that lie at the end of the strand, n 
unstacked pairs contribute the same statistical weight as 
before: 



v{n) 



(B3) 



completely unstacked strand: 



Ziir) 



x,r,p;l} J 



with ^{x.r,p;i} defined as the number of distinct config- 
urations of length I with r stacked pairs, of which x are 
at the end of the strand, divided between p contiguous 
regions of stacking. The advantage of this representation 
is that finding ^{x.r.p;i} is simply a matter of combina- 
torics. It can be shown that: 



n 



(l + (5f)(r- l)!(/-r-2)! 



(r - py.{p - -r-2-p + x)\{p - x)l ' 

(B7) 

for all possible values of x, r and p for a strand of length 
I, with the exception that ^{o,o,0;i} — ^{2,;-i,i;i} = 1- 

We assume that the temperature dependence of stack- 
ing is manifested in the parameter t, which is defined as 
t = exp{-Ah'*/RT + As''^/R), with A/i"* and As"* rep- 
resenting the (assumed constant) enthalpy and entropy 
changes associated with stack formation. As w and a 
arise from excluded volume effects, they are assumed to 
be entropic and hence temperature independent. We fit- 
ted this 4-parameter model to data obtained in simula- 
tions, the results are shown in Section[niB 1[ 



If a and w are unity, each neighbour pair is independent, 
and we return to Eqn. |B1| cr takes the role of a coopera- 
tivity parameter: for < u < 1, stacking is cooperative, 
in that configurations with multiple separate regions of 
stacking are disfavoured, and for a > I stacking is antico- 
operative. w accounts for end effects: for < w < 1, end 
bases are less likely to stack, and for w > 1 the opposite 
is true. 

Using these definitions, the total partition function for 
a strand of length I becomes: 



(B4) 



Here, {ni,rj;l} specifies a configuration, rii being the 
number of stacked pairs in the i^^ contiguous sequence of 
stacked neighbours, rrij being the number of unstacked 
pairs in the m^^ sequence of unstacked bases and x = 
^ j Xi is the total number of bases at the end of the strand 
involved in stacking. 

Defining t — u/v, n = rii and letting p be the total 
number of stacked regions, we obtain: 



Zi = Z] 



(B5) 



with Z" = zqv being the partition function of a com- 
pletely unstacked strand. To compare directly with sim- 
ulations, we require the ratio of the probability of ob- 
serving r stacked pairs to the probability of observing a 



Appendix C: Statistical model for duplex formation 

Eqn. [8] assumes a constant entropy and enthalpy 
difference between bound and unbound states. It is 
well known, however, that ASi and AHi should both 
become more negative with temperature, as the un- 
bound str ands beco me increasingly disordered due to 
unstacking.E^I^IIllIini Using the formalism of Appendix 
IBI we can factor out this effect: 



[A,B,] Zu _ cxp ( - li{AH[ - TAS[) ) (Z^) 



(CI) 

where in this case AH[ and AS*;' are the enthalpy and en- 
tropy difference between the duplex and unstacked single- 
stranded macrostates. 



Although fitting to Eqn. CI with constant AH[ and 
A^; was more successful than assuming constant AHi 
and AS"/, it overcorrected for the variations in ASi and 
AHi with temperature. The failure resulted from ne- 
glecting the changes in the bound state with tempera- 
ture, which were dominated by two effects: 

• As temperature increases, increased fraying leads 
to smaller entropy and enthalpy differences between 
typical bound states and completely unstacked sin- 
gle strands, as bound states become more disor- 
dered. 
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FIG. 17: A.H^{y) against duplex length I. Points are styled ac- 
cording to the total number of bonds formed, b = I — y. The solid 
lines are linear fits for the dependence of A_ff{'(j/) on I for fixed y. 



• Frayed ends themselves undergo a stacking tran- 
sition, once more resulting in the entropy and en- 
thalpy of bound states relative to unstacked strands 
becoming less negative with temperature. 

To incorporate these effects within a statistical model, 
we separately consider the entropy and enthalpy differ- 
ences between unstacked single strands and macrostates 
with y out of I possible base pairs formed. We then 
approximately adjust for stacking of the frayed ends by 
treating the 2{l — y) unpaired bases as undergoin g stack- 



ing with the same Ah and As'' given in Section |III B 1 
Cooperativity and end effects are ignored as it would be 
difficult to include them consistently when stacking is 
initiated adjacent to a duplex region. 
We thus define Zii{y) as: 



Zii = ^Zii{y), 



(C2) 



and Z'ii{y) as the contribution to Zii{y) in which none of 
the unpaired bases are stacked. Z^{y) is approximated 
by: 



Zuiy) = Zli{y){l + exp ( - /3(A/i^* - TAs^*)) 



2(i-y) 



Our hypothesis is that the enthalpy and entropy dif- 
ferences between unstacked single strands and the states 
contributing to Z]\{y) should be approximately constant 
for given I and y, as the temperature variation due to 
breaking stacks and fraying has been factored out. The 
values of Zn (y) / Zf were extracted from the fra ying data, 

and zriiy)/{zry 

ting to 



inferred using Eqns. B6 



and 



C3 



Fit- 



exp 



(-/?(Ai/,"(y)-rA50(z/)) 



(C4) 



with constant AH^{y) and ASi{y) (which represent the 
enthalpy and entropy differences between unstacked sin- 
gle strands and the states contributing to Z^iy)) was 
very successful. 



Furthermore, as shown in Fig. 17 AHi(y) (and 



(C3) 



ASi{y), which is not shown) are to an excellent approx- 
imation linear in I for fixed y. Thus, having factored out 
sources of variation with temperature in the initial and 
final states, we arrive at a statement similar to the ini- 
tial hypothesis of the nearest-neighbour model: adding 
an extra bp to a helix (i.e., increasing the length of the 
strands by one base, and forming one extra base pair, 
so that the number of unpaired bases is constant) con- 
tributes a constant enthalpy and entropy change relative 
to unstructured single strands. 

This finding suggests an extension of the nearest- 
neighbour model to non-two-state behaviour to incorpo- 
rate fraying and stacking, and thus predict the values 
of AS{T) and AH{T) for oligonucleotides. To achieve 
this description, fraying and stacking transitions must be 
sufficiently well characterized, and the assumption that 
helix stability is predominantly due to nearest-neighbour 
effects must hold, as it does in our model. It should be 
noted that at low temperatures, certain oligomers may 
also have significant contributions to the single-stranded 
state from hairpins, which are not incorporated into this 
model. 

We are finally in a position to characterize the hy- 
bridization transition with completely temperature in- 
dependent parameters. Combining Eqns. [8l |B6[[C2l|C3| 
and|C41 we find: 



. Zu E,cxp(-/3 Ai/,"(2;)-rA50(y) )(l + exp -/3(A/j^*-rAs«*) ) 
K,„ = exp {-p(AHi-TASi)] = = — ^ , ^ 

(C5) 

d ^ ^ {^H^jy) + 2(^ - y)Ah- , -gafAt^rJAC^l)) ) My) ^E^rAh^^ ^^^^ 

dp Zii Zi 
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Eqn. C6 is used in Section III B 4 to produce the fit of 



Ai7i5 to simulations. Tfie first term gives tfie enthalpy 
of duplexes with respect to unstacked single strands and 
the second term the enthalpy of two single strands with 
respect to their unstacked state. As can be seen, the 
agreement is good over a wide range of temperatures. 
Had hairpins been possible in the simulations of Sec- 



tion III B 4 they may have distorted the enthalpy at tem- 
peratures far below T^- Hairpins were excluded to make 
the interpretation of results clearer, and their presence 
would have made the single-stranded state's enthalpy 
more negative. This change would have led to a smaller 
transition enthalpy between single strands and duplexes 
at these temperatures. 



